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Abstract 

The novel “Volume-Enclosing Surface exTraction Al- 
gorith” (VESTA) generates triangular isosurfaces from 
computed tomography volumetric images and/or three- 
dimensional (3D) simulation data. Here, we present var¬ 
ious benchmarks for GPU-based code implementations 
of both VESTA and the current state-of-the-art Marching 
Cubes Algorithm (MCA). One major result of this study is 
that VESTA runs significantly faster than the MCA. 

Introduction 

NVIDIAs toolkits ( cf, e.g., Ref. O) for the develop¬ 
ment of CUDA®-based software contain, among many 
other things, example code for an extended version (2) of 
the original MCA El- Here, we compare the performance 
of this code with our CUD A®- and ANSI-C-based imple¬ 
mentation of VESTA 0| on a Linux-based (i.e., openSUSE 
13.1) PC with a GeForce GTX 750 Ti graphics card. 

In particular, the times that we have measured (cf., Ta¬ 
ble 1) are averages over 1000 runs each. The measurements 
start after the data sets have been loaded into texture mem¬ 
ory, and they stop after all point coordinates and triplets of 
point IDs (i.e., triangles) have been computed on the GPU. 


Technique 

Extended MCA 

Marching VESTA 

Mode 

DCED/L 

DCED/L 

Mixed / H 

(a) Points 

19,218 

12,814 

15,292 

Triangles 

6406 

6406 

11,362 

Time (ms) 

1.43(5) 

1.28(5) 

1.37(4) 

(b) Points 

6,128,724 

4,085, 840 

4, 852, 644 

Triangles 

2,042, 908 

2,042,908 

3,576,516 

Time (ms) 

98.5(1) 

71.3(1) 

75.9(4) 

(c) Points 

5, 566, 998 

3, 699, 086 

4, 346,120 

Triangles 

1,855,666 

1,855,666 

3,147, 604 

Time (ms) 

23.0(1) 

18.7(1) 

22.4(1) 

(d) Points 

33,240 

22,208 

25,894 

Triangles 

11,080 

11,080 

18,350 

Time (ms) 

0.82(4) 

0.63(4) 

0.74(4) 

(e) Points 

13,859,304 

9, 267, 824 

11,178,649 

Triangles 

4,619,768 

4,619,768 

8,441,610 

Time (ms) 

111.2(1) 

84.4(1) 

94.6(6) 


Table 1: Benchmarks for various processed tomographic 
data sets: for (a) - (c), cf, Ref. ffl and Ref.s therein, (d) 
Bucky.raw data is a portion of fT), and (e) Happy Buddha 
VRI file 0. For the selected isovalues, cf, Fig. 1. 



Figure 1: VESTA high resolution “mixed” mode (Mixed/ 
H) isosurface renderings, where the isovalues equal to (a) 
139, (b) 150, (c) 135, (d) 128, and (e) 150, respectively. 

Results 

For the here considered data sets EEIE), the extended 
MCA is about (a) 12%, (b) 38%, (c) 23%, (d) 30%, and 
(e) 32%, slower than the marching variant of VESTA 0, 
when the latter is executed in its low resolution “discon¬ 
nect” mode (DCED/L). Furthermore, VESTA is also faster 
even if higher resolution isosurfaces are computed (cf, 
Fig. 1), which have about twice the number of triangles 
(cf, Table 1). 

Note that the current code implementation of VESTA 
does not yet use parallel streaming, nor it does call de¬ 
vice kernels from within kernels. As a consequence, fur¬ 
ther GPU-based code optimisations may result in an even 
faster VESTA code. 
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